fix: handle context length too large #311

Merged
johnwalz97 merged 4 commits into prod from john6797/sc-8457/experian-gets-an-error-on-too-large-dataset-prod on Feb 11, 2025

Conversation

@johnwalz97 johnwalz97 commented Feb 11, 2025

Internal Notes for Reviewers

Hotfix for context-length-exceeded errors raised when a test result is too large to generate a description.

External Release Notes

@johnwalz97 johnwalz97 added the bug (Something isn't working) and internal (Not to be externalized in the release notes) labels on Feb 11, 2025

@cachafla cachafla left a comment

Great 👌

@github-actions

PR Summary

This pull request introduces a new function, _truncate_summary, in validmind/ai/test_descriptions.py that truncates summaries exceeding a maximum token length. The function encodes the summary with the tiktoken library and compares the token count against max_tokens, which defaults to 100,000. If the summary exceeds this limit, it is truncated and a warning is logged indicating that the generated description may be inaccurate.
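The truncation step described above can be sketched roughly as follows. This is an illustrative sketch, not the code merged in this PR: the function name truncate_summary, the encoding parameter, the cl100k_base encoding, and the WhitespaceEncoding stand-in are all assumptions. Only the use of tiktoken, the 100,000-token default, and the warning come from the summary.

```python
import logging

logger = logging.getLogger(__name__)


def truncate_summary(summary, max_tokens=100_000, encoding=None):
    """Return `summary` cut to at most `max_tokens` tokens, warning if it was cut."""
    if encoding is None:
        # Imported lazily so the sketch also runs with a stand-in encoder.
        import tiktoken

        # Assumption: the PR summary does not say which encoding is used.
        encoding = tiktoken.get_encoding("cl100k_base")

    tokens = encoding.encode(summary)
    if len(tokens) <= max_tokens:
        return summary

    logger.warning(
        "Summary is %d tokens, over the %d-token limit; truncating. "
        "The generated description may be inaccurate.",
        len(tokens),
        max_tokens,
    )
    # Keep the first max_tokens tokens and turn them back into text.
    return encoding.decode(tokens[:max_tokens])


class WhitespaceEncoding:
    """Hypothetical stand-in encoder: one token per whitespace-separated word."""

    def encode(self, text):
        return text.split()

    def decode(self, tokens):
        return " ".join(tokens)
```

Passing encoding=WhitespaceEncoding() exercises the logic without tiktoken installed; leaving it as None uses tiktoken, which must be installed and may need to fetch its encoding data on first use.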

Additionally, the generate_description function has been updated to use _truncate_summary for the summary field, ensuring that summaries are appropriately truncated before being returned.

Error handling in the wrapped function now logs a more specific warning when a test result is too large to generate a description; in that case a default description is used instead.
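One way such a fallback might look is sketched below. Every name here (describe_with_fallback, DEFAULT_DESCRIPTION, CONTEXT_LENGTH_MARKER) is hypothetical, and detecting the oversized case by matching text in the exception message is an assumption; the PR summary does not say how the real code distinguishes exception types.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical placeholder text; the real default description is not shown in the PR.
DEFAULT_DESCRIPTION = "No description available for this test result."

# Assumption: the provider's error text contains this marker when the prompt is too long.
CONTEXT_LENGTH_MARKER = "context_length_exceeded"


def describe_with_fallback(generate, default=DEFAULT_DESCRIPTION):
    """Return generate(); on any failure, log a warning and return `default`."""
    try:
        return generate()
    except Exception as exc:
        if CONTEXT_LENGTH_MARKER in str(exc):
            # Specific warning for the oversized-result case.
            logger.warning(
                "Test result is too large to generate a description; "
                "using the default description instead."
            )
        else:
            # Generic warning for every other failure.
            logger.warning(
                "Failed to generate a description (%s); "
                "using the default description instead.",
                exc,
            )
        return default
```

Returning a default rather than re-raising keeps the test run going when only the description step fails, which matches the behavior the summary describes.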

Imports in validmind/tests/run.py have been reordered to follow standard import ordering.

Test Suggestions

  • Test the _truncate_summary function with summaries of varying lengths to ensure it correctly truncates when necessary.
  • Verify that the generate_description function integrates _truncate_summary correctly and returns truncated summaries when appropriate.
  • Check the logging output to ensure that warnings are logged correctly when summaries are truncated or when default descriptions are used.
  • Test the error handling in the wrapped function to ensure it logs the correct warnings for different types of exceptions.
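The truncation and logging checks suggested above could be written with the standard library's unittest and assertLogs, as in the sketch below. The function under test, truncate_words, is a hypothetical word-based stand-in so the example runs on its own; in the real test suite it would be replaced by _truncate_summary, whose exact signature is not shown in this PR summary.

```python
import logging
import unittest

logger = logging.getLogger("summary_sketch")


def truncate_words(text, max_words):
    """Hypothetical stand-in for the function under test: caps text at max_words words."""
    words = text.split()
    if len(words) <= max_words:
        return text
    logger.warning("Summary truncated to %d words", max_words)
    return " ".join(words[:max_words])


class TruncationTests(unittest.TestCase):
    def test_short_input_is_returned_unchanged(self):
        self.assertEqual(truncate_words("a b", 3), "a b")

    def test_long_input_is_truncated_with_warning(self):
        # assertLogs fails the test if no WARNING is emitted on this logger.
        with self.assertLogs("summary_sketch", level="WARNING") as captured:
            result = truncate_words("a b c d e", 3)
        self.assertEqual(result, "a b c")
        self.assertEqual(len(captured.records), 1)
        self.assertIn("truncated", captured.output[0])


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TruncationTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() runs both cases; the same assertLogs pattern applies to the warning logged when a default description is used.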

@johnwalz97 johnwalz97 merged commit da7bdcb into prod Feb 11, 2025
6 checks passed
@johnwalz97 johnwalz97 deleted the john6797/sc-8457/experian-gets-an-error-on-too-large-dataset-prod branch February 11, 2025 18:12

Labels

bug (Something isn't working), internal (Not to be externalized in the release notes)

2 participants